8  Data Structures in R - Practical

8.1 Demonstration: Working with four basic data structures

Type One: Matrices

A matrix is like a spreadsheet, but every element must be of the same data type.

In this example, we create two matrices:

rm(list=ls()) # this code cleans my environment

data <- c(1, 2, 3, 4, 5, 6) # create data
matrix_1 <- matrix(data, nrow = 2, ncol = 3) # create the first matrix by arranging the data vector into 2 rows and 3 columns
matrix_2 <- matrix(data, nrow = 3, ncol = 2) # another matrix, with 3 rows and 2 columns

# print these to console window
print(matrix_1)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
print(matrix_2)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
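
By default, matrix() fills a matrix column by column, which is why matrix_1 reads 1, 2 down its first column. As a small sketch (the names by_row and mixed are illustrative and not part of the example above), setting byrow = TRUE fills row by row instead, and mixing numbers with text shows the "same type" rule in action:

```r
values <- c(1, 2, 3, 4, 5, 6)

# Fill row by row instead of the default column by column
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)
print(by_row)

# Mixing numbers and text forces every element to become character
mixed <- matrix(c(1, "a", 3, "b"), nrow = 2)
print(class(mixed[1, 1]))
```

Here the first row of by_row is 1, 2, 3, and every element of mixed is stored as text.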

We use square brackets [ ] with row and column indices to access elements in a matrix.

first_row_second_column <- matrix_1[1, 2] # note that this assumes you ran the previous code to create matrix_1!
entire_second_row <- matrix_1[2, ]
entire_third_column <- matrix_1[, 3]

print(first_row_second_column)
[1] 3
print(entire_second_row)
[1] 2 4 6
print(entire_third_column)
[1] 5 6
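
Selecting a single row or column returns a plain vector rather than a matrix. A minimal sketch, assuming you want to keep the matrix shape (drop = FALSE) or leave a column out (a negative index); the object names are illustrative:

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

third_col_vector <- m[, 3]                # plain vector of length 2
third_col_matrix <- m[, 3, drop = FALSE]  # still a 2 x 1 matrix
without_first_col <- m[, -1]              # drops column 1, keeps columns 2 and 3

print(dim(third_col_matrix))
print(without_first_col)
```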

We can modify (add or update) elements in a matrix by assigning values using row and column indices:

matrix_1[1, 1] <- 42 # this changes the element at row 1, column 1 to 42
matrix_2[, 2] <- c(7, 8, 9) # this changes the contents of column 2 to 7,8,9

print(matrix_1)
     [,1] [,2] [,3]
[1,]   42    3    5
[2,]    2    4    6
print(matrix_2)
     [,1] [,2]
[1,]    1    7
[2,]    2    8
[3,]    3    9
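
Index assignment updates cells that already exist. To add a whole row or column, base R offers rbind() and cbind(). A hedged sketch with illustrative names:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

m_wider <- cbind(m, c(7, 8))         # append a fourth column
m_taller <- rbind(m, c(10, 20, 30))  # append a third row

print(dim(m_wider))
print(dim(m_taller))
```

The new column or row must have as many values as the matrix has rows or columns, respectively.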

We can also perform arithmetic and logical operations on matrices, such as element-wise addition, subtraction, multiplication, and division:

A <- matrix(c(1, 2, 3, 4), nrow = 2) # create our first matrix
B <- matrix(c(5, 6, 7, 8), nrow = 2) # create our second matrix
sum_matrix <- A + B # create a new matrix which is the element-wise sum of the two matrices
product_matrix <- A * B # create a new matrix which is the element-wise product of the two matrices

print(sum_matrix)
     [,1] [,2]
[1,]    6   10
[2,]    8   12
print(product_matrix)
     [,1] [,2]
[1,]    5   21
[2,]   12   32

We can apply functions to matrices to perform various operations, such as calculating the transposed matrix, row and column sums, and more:

transpose_matrix <- t(A) # this transposes matrix A
row_sums <- rowSums(A)
col_sums <- colSums(A)

print(transpose_matrix)
     [,1] [,2]
[1,]    1    2
[2,]    3    4
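
rowSums() and colSums() cover the most common case. For other summaries, apply() runs a function over every row (second argument 1) or every column (second argument 2). A short sketch on the same matrix A; the names row_max and col_mean are illustrative:

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)

print(rowSums(A))  # 4 6
print(colSums(A))  # 3 7

row_max <- apply(A, 1, max)    # largest value in each row
col_mean <- apply(A, 2, mean)  # mean of each column
print(row_max)
print(col_mean)
```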

Use the %*% operator to perform matrix multiplication (the * operator multiplies element-wise):

multiplied_matrix <- A %*% t(B)

print(multiplied_matrix)
     [,1] [,2]
[1,]   26   30
[2,]   38   44
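
In R, * multiplies two matrices cell by cell, whereas %*% computes the matrix product (rows of the first matrix combined with columns of the second). A short sketch contrasting the two on the same A and B; the result names are illustrative:

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)

elementwise <- A * B       # each cell of A times the matching cell of B
matrix_product <- A %*% B  # rows of A combined with columns of B

print(elementwise)
print(matrix_product)
```

For %*% to work, the number of columns in the first matrix must equal the number of rows in the second.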

Type Three: Lists

Lists are used to store and organise a collection of elements. Unlike vectors and matrices, lists can store elements of different data types and structures.

We can use the list() function to create a list by combining elements:

rm(list=ls()) # this code cleans my environment

simple_list <- list(42, "celtic", TRUE)
nested_list <- list(number = 42, text = "hello", vector = c(1, 2, 3), matrix = matrix(1:4, nrow = 2))

When you run this code, look at your environment window, and click on [nested_list]. Do you see what the code above has created?

We can use double square brackets [[ ]] with an index or a name, or the dollar sign $ with a name, to access elements in a list:

first_element <- simple_list[[1]] # access using index
named_element <- nested_list$text # access using name
third_element <- nested_list$vector # access using name
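
Single square brackets also work on lists, but they return a smaller list rather than the element itself. A minimal sketch of the difference, using an illustrative list called demo_list:

```r
demo_list <- list(number = 42, text = "hello", vector = c(1, 2, 3))

as_sublist <- demo_list["vector"]    # a list of length 1
as_element <- demo_list[["vector"]]  # the numeric vector itself

print(class(as_sublist))  # "list"
print(class(as_element))  # "numeric"
print(demo_list[["vector"]][2])  # second value of the vector
```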

We can add, update, or remove elements by assigning values using indexing or names:

simple_list[[2]] <- "banana"
nested_list$new_element <- "Morton are great!"
nested_list$number <- NULL # removes the 'number' element

We can also perform operations on elements within a list using indexing or names to access them:

sum_vector <- nested_list$vector + c(4, 5, 6)
new_matrix <- nested_list$matrix * 2

We can apply functions to lists to perform different operations, such as calculating the length of the list or extracting specific elements from it:

list_length <- length(simple_list) # returns the list length
first_two_elements <- simple_list[1:2]  # returns the first two elements of the list
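
To run the same function on every element of a list, base R provides lapply() (which returns a list) and sapply() (which simplifies the result to a vector where it can). A small sketch with illustrative names:

```r
demo_list <- list(42, "banana", TRUE)

element_classes <- sapply(demo_list, class)   # one class name per element
element_lengths <- lapply(demo_list, length)  # a list of lengths

print(element_classes)
print(element_lengths)
```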

We can convert a list to other data structures using functions such as unlist(), as.data.frame(), or as.matrix(), as long as the list’s structure permits it:

simple_list <- list(1, 2, 3)
vector_from_list <- unlist(simple_list) # create a vector from a list
print(vector_from_list)
[1] 1 2 3
nested_list <- list(list(1, 2), list(3, 4, 5))
dataframe_from_list <- as.data.frame(nested_list) # create a dataframe from two lists
print(dataframe_from_list)
  X1 X2 X3 X4 X5
1  1  2  3  4  5
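
as.data.frame() gives a more conventional result when the list is named and every element has the same length, because each element then becomes one column. A hedged sketch, with country_list and country_df as illustrative names:

```r
country_list <- list(
  Name = c("Scotland", "England", "Wales"),
  Age = c(25, 30, 22)
)

country_df <- as.data.frame(country_list)  # one column per list element
print(country_df)
print(dim(country_df))
```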

Type Four: Data Frames

Data frames are similar to matrices, but can store columns of different data types, making them ideal for handling datasets with mixed data types.

We use the data.frame() function to create a data frame by combining vectors or other data structures as columns:

rm(list=ls()) # this code cleans my environment

names <- c("Scotland", "England", "Wales") # create a vector of names
ages <- c(25, 30, 22) # create a vector of ages
heights <- c(165, 180, 172) # create a vector of heights
data <- data.frame(Name = names, Age = ages, Height = heights) # this creates a dataframe called [data], which includes all three vectors

print(data)
      Name Age Height
1 Scotland  25    165
2  England  30    180
3    Wales  22    172

As with matrices, we can use square brackets [ ], double square brackets [[ ]], or the dollar sign with row and column indices or names to access elements, rows, or columns in our data frame.

For example:

first_row <- data[1, ]
age_column <- data$Age # note how we refer to a specific vector (variable) within the dataframe
third_row_second_column <- data[3, "Age"]
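
A few more access patterns, sketched on an illustrative copy of the same data (called countries here so that it does not overwrite data):

```r
countries <- data.frame(
  Name = c("Scotland", "England", "Wales"),
  Age = c(25, 30, 22),
  Height = c(165, 180, 172)
)

age_by_name <- countries[["Age"]]                      # same vector as countries$Age
first_two_rows <- countries[1:2, c("Name", "Height")]  # rows 1 and 2, two named columns
one_column_df <- countries["Age"]                      # single brackets keep a data frame

print(age_by_name)
print(first_two_rows)
print(class(one_column_df))
```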

We can add, update, or remove elements, rows, or columns by assigning values using indexing or names.

data$Name[1] <- "Alicia"     # change an element
data$Weight <- c(60, 85, 75) # add a new column
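data[4, ] <- c("David", 23, 185, 80) # add a new row; note that c() coerces every value to character, so Age, Height, and Weight become character columns
data$Weight <- NULL # remove the 'Weight' column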
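
A vector built with c() can hold only one type, so mixing text and numbers turns every value into character. One way to keep numeric columns numeric (a sketch only, with countries, new_row, and countries_more as illustrative names) is to build the new row as a one-row data frame and append it with rbind():

```r
countries <- data.frame(
  Name = c("Scotland", "England", "Wales"),
  Age = c(25, 30, 22),
  Height = c(165, 180, 172)
)

# The new row has the same column names and types as the original
new_row <- data.frame(Name = "David", Age = 23, Height = 185)
countries_more <- rbind(countries, new_row)

str(countries_more)  # Age and Height are still numeric
```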

We can also perform operations on elements, rows, or columns within a data frame using indexing or names to access them:

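data$Age <- as.numeric(data$Age) # the new row turned Age into character, so convert it back to numeric
data$Height <- as.numeric(data$Height) # Height needs the same conversion before a numeric comparison
avg_age <- mean(data$Age) # we can then do some calculations on it
tall_people <- data[data$Height > 175, ]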

We can apply functions to data frames to perform various operations, such as calculating the dimensions, extracting specific elements, and more:

num_rows <- nrow(data) # this function (nrow) tells us how many rows are in our data frame
num_columns <- ncol(data)
column_names <- colnames(data)
row_names <- rownames(data)

We can use logical conditions, column indices, or column names to filter or subset data frames:

adults <- data[data$Age >= 18, ]
name_age <- data[, c("Name", "Age")]

We can also use this approach to remove a variable from a data frame:

data_02 <- subset(data, select = -c(Age)) # creates a new data frame without variable [Age]
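
Conditions can be combined with & (and) or | (or), and subset() can filter rows and pick columns in one call. A minimal sketch on an illustrative data frame called countries; the thresholds are arbitrary:

```r
countries <- data.frame(
  Name = c("Scotland", "England", "Wales"),
  Age = c(25, 30, 22),
  Height = c(165, 180, 172)
)

# Rows that satisfy both conditions
tall_adults <- countries[countries$Age >= 25 & countries$Height > 170, ]

# Filter rows and keep only two columns in one step
tall_names <- subset(countries, Height > 170, select = c(Name, Height))

print(tall_adults)
print(tall_names)
```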

Type Five: Tibbles

Tibbles offer several improvements over data frames, such as more compact printing in the console (including each column's data type) and the ability to handle column names with special characters or spaces.

Tibbles are an integral part of the tidyverse package and work well with other tidyverse functions and packages.

rm(list=ls()) # this code cleans my environment

library(tidyverse) # assumes you've installed tidyverse!
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

We can use the tibble() function to create a tibble, by combining vectors or other data structures as columns:

names <- c("Alice", "Bob", "Charlie")
ages <- c(25, 30, 22)
heights <- c(165, 180, 172)
tb <- tibble(Name = names, Age = ages, Height = heights)

We can use the as_tibble() function to convert an existing data frame to a tibble:

df <- data.frame(Name = names, Age = ages, Height = heights)
tb <- as_tibble(df)
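
To illustrate the point about column names, here is a small sketch (assuming the tibble package is installed; tb_spaces and df_spaces are illustrative names) comparing how tibble() and data.frame() treat names that contain spaces or brackets:

```r
library(tibble)

tb_spaces <- tibble(`Country name` = c("Scotland", "Wales"), `Age (years)` = c(25, 22))
df_spaces <- data.frame(`Country name` = c("Scotland", "Wales"), `Age (years)` = c(25, 22))

print(names(tb_spaces))  # names kept exactly as written
print(names(df_spaces))  # data.frame() rewrites them into syntactic names
print(class(tb_spaces))  # a tibble is still a data frame underneath
```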

Similar to data frames, use square brackets [ ], double square brackets [[ ]], or the dollar sign with row and column indices or names to access elements, rows, or columns in a tibble:

first_row <- tb[1, ]
age_column <- tb$Age
third_row_second_column <- tb[3, "Age"]
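
One difference from data frames is worth knowing: single square brackets on a tibble always return a tibble, even for one cell. A hedged sketch (assuming the tibble package is installed; df_demo and tb_demo are illustrative names chosen so they do not overwrite df or tb):

```r
library(tibble)

df_demo <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(25, 30, 22))
tb_demo <- as_tibble(df_demo)

from_df <- df_demo[3, "Age"]  # data frame: a bare number
from_tb <- tb_demo[3, "Age"]  # tibble: still a 1 x 1 tibble
as_value <- tb_demo$Age[3]    # extract the column first to get the number

print(class(from_df))
print(class(from_tb))
print(as_value)
```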

We can add, update, or remove elements, rows, or columns by assigning values using indexing or names:

tb$Name[1] <- "Alicia"
tb$Weight <- c(60, 85, 75) # Add a new column
tb <- add_row(tb, Name = "David", Age = 23, Height = 185, Weight = 80) # Add a new row
tb$Weight <- NULL # Remove the 'Weight' column

We can perform operations on elements, rows, or columns within a tibble using indexing or names to access them:

avg_age <- mean(tb$Age)
tall_people <- tb[tb$Height > 175, ]

We can apply functions to tibbles to perform various operations such as calculating the dimensions, extracting specific elements, and more:

num_rows <- nrow(tb)
num_columns <- ncol(tb)
column_names <- colnames(tb)
row_names <- rownames(tb)

We can use logical conditions, column indices, or column names to filter or subset tibbles:

adults <- tb[tb$Age >= 18, ]
name_age <- tb[, c("Name", "Age")]

8.2 Practice: Working with the four basic data structures

Task 1: Matrices

  1. Create and Modify: Create a 4x4 matrix with elements from 1 to 16. Then change the element in the third row, second column to 100.
  2. Element Access and Operations: Extract the second column from your matrix. Calculate the sum of the elements in this column.
  3. Arithmetic Operations: Add 5 to each element of the entire matrix. Then, create another 4x4 matrix of random numbers and find the element-wise product of the two matrices.
  4. Function Application: Calculate and print the row sums and column sums of the final matrix you obtained in the previous step.

Task 2: Lists

  1. Create and Access: Make a list containing a numeric vector, a character vector, and a logical vector. Access and print the second element of the list.
  2. Update and Modify: Add a new element which is another list containing three character elements. Update the first element of the outer list to be twice its original values.
  3. Operations on List Elements: From the nested list you added, extract the second element and concatenate it with the first element of the main list.

Task 3: Data Frames

  1. Creating Data Frames: Create a data frame with three columns: [ID] (1-5), [Temperature] (random numbers representing temperature), and [Status] (character strings of different weather conditions).
  2. Access and Modify: Extract the [Temperature] column using two different methods. Increase all temperatures by 3 degrees.
  3. Logical Operations: Select the rows where [Temperature] is above a certain threshold (you decide the value) and print these rows.
  4. Add and Remove Columns: Add a new column [AdjustedTemp] which is the original temperature plus 10. Then, remove the Status column.

Task 4: Tibbles

  1. Create Tibbles: Convert the data frame you created in Task 3 into a tibble.
  2. Modify and Access: Replace the first row with new data of your choosing. Then extract and print rows where [AdjustedTemp] is greater than a certain threshold.
  3. Operations: Calculate the average of [ID] and print it. Find all rows where [ID] is less than 3 and print them.

General Task

Conversion: Convert the tibble back into a data frame, then into a list, and finally convert this list into a vector (if possible). Discuss the outputs at each step, noting any data loss or changes in structure.

8.3 Possible Solutions

Task 1: Matrices

Create and Modify

mat <- matrix(1:16, nrow=4)
mat[3, 2] <- 100
print(mat)

Element Access and Operations

second_column <- mat[, 2]
sum_second_column <- sum(second_column)
print(sum_second_column)

Arithmetic Operations

mat <- mat + 5
random_mat <- matrix(runif(16, 1, 10), nrow=4)  # Random numbers between 1 and 10
product_mat <- mat * random_mat
print(product_mat)

Function Application

row_sums <- rowSums(product_mat) # product_mat is the final matrix from the previous step
col_sums <- colSums(product_mat)
print(row_sums)
print(col_sums)

Task 2: Lists

Create and Access

my_list <- list(numeric_vector = 1:5, character_vector = c("one", "two"), logical_vector = c(TRUE, FALSE))
print(my_list[[2]])

Update and Modify

my_list$nested_list <- list("apple", "banana", "cherry")
my_list$numeric_vector <- my_list$numeric_vector * 2
print(my_list)

Operations on List Elements

new_vector <- c(my_list$nested_list[[2]], my_list[[1]]) # my_list[[1]] is the numeric vector; its values are coerced to character
print(new_vector)

Task 3: Data Frames

Creating Data Frames

df <- data.frame(ID = 1:5, Temperature = runif(5, 20, 30), Status = c("Sunny", "Rainy", "Cloudy", "Windy", "Snowy"))
print(df)

Access and Modify

temp_col1 <- df$Temperature
temp_col2 <- df[, "Temperature"]
df$Temperature <- df$Temperature + 3
print(df)

Logical Operations

filtered_df <- df[df$Temperature > 25, ]
print(filtered_df)

Add and Remove Columns

df$AdjustedTemp <- df$Temperature + 10
df$Status <- NULL
print(df)

Task 4: Tibbles

Create Tibbles

tb <- as_tibble(df) # as_tibble() comes from the tibble package, loaded with library(tidyverse)

Modify and Access

tb[1, ] <- tibble(ID = 6, Temperature = 26, AdjustedTemp = 36)
high_temp <- tb[tb$AdjustedTemp > 30, ]
print(high_temp)

Operations

avg_id <- mean(tb$ID)
print(avg_id)
small_ids <- tb[tb$ID < 3, ]
print(small_ids)

General Task

Conversion

df2 <- as.data.frame(tb)
list_from_df <- as.list(df2)
vector_from_list <- unlist(list_from_df)
print(df2)
print(list_from_df)
print(vector_from_list)
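
To help with the discussion part of this task, here is a small deterministic sketch of what each conversion does when a text column is still present. The data frame df_demo and its values are made up for illustration:

```r
df_demo <- data.frame(
  ID = 1:2,
  Temperature = c(21.5, 24),
  Status = c("Sunny", "Rainy")
)

as_list <- as.list(df_demo)   # one list element per column, types preserved
as_vector <- unlist(as_list)  # one flat vector, so every value becomes character

print(names(as_vector))  # column name plus row position, e.g. "ID1"
print(class(as_vector))  # "character"
```

The table structure is lost at the last step: rows and columns collapse into a single named vector, and numbers survive as numbers only when every column is numeric.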